VC Class


Characterizing Online and Private Learnability under Distributional Constraints via Generalized Smoothness

Blanchard, Moïse, Shetty, Abhishek, Rakhlin, Alexander

arXiv.org Machine Learning

Understanding minimal assumptions that enable learning and generalization is perhaps the central question of learning theory. Several celebrated results in statistical learning theory, such as the VC theorem and Littlestone's characterization of online learnability, establish conditions on the hypothesis class that allow for learning under independent data and adversarial data, respectively. Building upon recent work bridging these extremes, we study sequential decision making under distributional adversaries that can adaptively choose data-generating distributions from a fixed family $U$ and ask when such problems are learnable with sample complexity that behaves like the favorable independent case. We provide a near-complete characterization of families $U$ that admit learnability in terms of a notion known as generalized smoothness: a distribution family admits VC-dimension-dependent regret bounds for every finite-VC hypothesis class if and only if it is generalized smooth. Further, we give universal algorithms that achieve low regret under any generalized smooth adversary without explicit knowledge of $U$. Finally, when $U$ is known, we provide refined bounds in terms of a combinatorial parameter, the fragmentation number, that captures how many disjoint regions can carry nontrivial mass under $U$. These results provide a nearly complete understanding of learnability under distributional adversaries. In addition, building upon the surprising connection between online learning and differential privacy, we show that generalized smoothness also characterizes private learnability under distributional constraints.
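For readers who want a concrete handle on what "finite-VC hypothesis class" means in these regret bounds, the following is a minimal brute-force sketch (our own illustration, not from the paper; the threshold class and function names are invented for the example) that computes the VC dimension of a finite hypothesis class over a finite domain.

from itertools import combinations

def vc_dimension(hypotheses, domain):
    # Brute-force VC dimension of a finite hypothesis class over a finite domain.
    # hypotheses: callables mapping a domain point to 0/1; domain: list of points.
    hypotheses = list(hypotheses)
    best = 0
    for k in range(1, len(domain) + 1):
        shattered = False
        for subset in combinations(domain, k):
            patterns = {tuple(h(x) for x in subset) for h in hypotheses}
            if len(patterns) == 2 ** k:  # every labeling of `subset` is realized
                shattered = True
                break
        if not shattered:
            break  # subsets of shattered sets are shattered, so larger sizes cannot work
        best = k
    return best

# Toy check: threshold classifiers 1[x >= t] on {0,...,4} have VC dimension 1.
thresholds = [lambda x, t=t: int(x >= t) for t in range(6)]
print(vc_dimension(thresholds, list(range(5))))  # prints 1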





Towards a Unified Information-Theoretic Framework for Generalization

Neural Information Processing Systems

Let $\mathcal{D}$ be an unknown distribution on a space $\mathcal{Z}$, and let $\mathcal{H}$ be a set of classifiers. Consider a (randomized) learning algorithm $\mathcal{A} = (\mathcal{A}_n)_{n \geq 1}$ that selects an element $\hat{h}$ in $\mathcal{H}$ based on $n$ i.i.d. samples.
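For concreteness, the central quantity that such information-theoretic frameworks bound is the expected generalization gap of the selected classifier. Writing $\ell$ for a loss function and $z_1, \dots, z_n \sim \mathcal{D}$ for the i.i.d. sample (our notation, not taken from the snippet above):

$$\mathrm{gen}(\mathcal{D}, \mathcal{A}_n) \;=\; \mathbb{E}\left[\, \mathbb{E}_{z \sim \mathcal{D}}\big[\ell(\hat{h}, z)\big] \;-\; \frac{1}{n}\sum_{i=1}^{n} \ell(\hat{h}, z_i) \,\right],$$

where the outer expectation is over the sample and the internal randomness of $\mathcal{A}_n$.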


On the Recursive Teaching Dimension of VC Classes

Neural Information Processing Systems

The recursive teaching dimension (RTD) of a concept class $C \subseteq \{0, 1\}^n$, introduced by Zilles et al. [ZLHZ11], is a complexity parameter measured by the worst-case number of labeled examples needed to learn any target concept of $C$ in the recursive teaching model. In this paper, we study the quantitative relation between RTD and the well-known learning complexity measure VC dimension (VCD), and improve the best known upper and (worst-case) lower bounds on the recursive teaching dimension with respect to the VC dimension. Given a concept class $C \subseteq \{0, 1\}^n$ with $VCD(C) = d$, we first show that $RTD(C)$ is at most $d \cdot 2^{d+1}$. This is the first upper bound for $RTD(C)$ that depends only on $VCD(C)$, independent of the size of the concept class $|C|$ and its domain size $n$.
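To make the two quantities concrete, here is a small brute-force sketch (our own illustration, not from the paper; concepts are represented as dicts mapping domain points to labels) that computes teaching sets and the recursive teaching dimension via the usual recursive definition: repeatedly remove the concepts that are cheapest to teach within the remaining class.

from itertools import combinations

def teaching_dim(c, concepts, domain):
    # Smallest number of labeled examples that distinguish concept `c`
    # from every other concept in `concepts`.
    others = [h for h in concepts if h != c]
    if not others:
        return 0
    for k in range(1, len(domain) + 1):
        for S in combinations(domain, k):
            if all(any(h[x] != c[x] for x in S) for h in others):
                return k
    return len(domain)

def recursive_teaching_dim(concepts, domain):
    # RTD: peel off the concepts with minimum teaching dimension at each step;
    # the RTD is the largest teaching-set size incurred along the way.
    remaining = list(concepts)
    rtd = 0
    while remaining:
        tds = [teaching_dim(c, remaining, domain) for c in remaining]
        m = min(tds)
        rtd = max(rtd, m)
        remaining = [c for c, t in zip(remaining, tds) if t > m]
    return rtd

# Toy check: all four labelings of a two-point domain have VCD = 2 and RTD = 2.
domain = [0, 1]
cube = [{0: a, 1: b} for a in (0, 1) for b in (0, 1)]
print(recursive_teaching_dim(cube, domain))  # prints 2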


comments and criticism

Neural Information Processing Systems

We would like to thank the reviewers for their comments and helpful suggestions. We respond below to the reviewers' main points.
R1: The term "privacy model" is confusing: We will definitely think of a better term.
R1: "More discussion of the broader impact": We will elaborate more on the potential impact of the proposed framework. We believe it can be a basis for a more flexible, more expressive framework for DP learning.
R3: "Intuitively, the sample complexity should be comparable to that of the non-private one": We disagree with this intuition.
R3: "I'd expect some simple but a bit more advance model, like a Bernoulli model...": We also want to point out that the label-determined model does capture some realistic scenarios.



Smoothed Analysis of Sequential Probability Assignment

Neural Information Processing Systems

Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment for smoothed adversaries to minimax rates for transductive learning. This leads to optimal (logarithmic) fast rates for parametric classes and classes with finite VC dimension.
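For reference, the standard objects behind these rates, written in our notation rather than the paper's: an adversary is $\sigma$-smooth with respect to a base measure $\mu$ if every distribution $p_t$ it plays satisfies $p_t(A) \le \mu(A)/\sigma$ for all measurable sets $A$, and the regret of a sequential probability assignment $\hat{p}_1, \dots, \hat{p}_T$ under log loss against a reference class $\mathcal{F}$ is

$$\mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \log\frac{1}{\hat{p}_t(y_t)} \;-\; \inf_{f \in \mathcal{F}} \sum_{t=1}^{T} \log\frac{1}{f(y_t \mid x_t)},$$

so the "fast (logarithmic) rates" above refer to regret growing only logarithmically in $T$.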